In This Talk

I will walk you through these steps (Dataset > Manipulate to extract information > Plot > Communicate) using an example dataset

Find these slides @ https://bit.ly/2lyGAqr

My Background

Originally a worm biologist, now a bioinformatician at the Monash Bioinformatics Platform and, more recently, an R-Ladies Melbourne organiser

This talk can be considered ‘Most Useful Things Worm Adele Would Have Liked to Have Known When Starting Out in R’

R & Things I Like About It

Programming language for statistical computing and graphics

Lots of plotting functionality, and well geared towards data analysis out of the box, with built-in statistical tests

Well-developed ecosystem of software packages that extends base R for analysis, project management, visualisation, document generation, etc.

Continuous active development

Thorough documentation

R Markdown

The marriage between Markdown, a lightweight markup language, and R, a programming language for statistics

An R Markdown file is a plain text document that lets you embed R code chunks alongside plain text notes, images & videos.

Structure:

  1. YAML header - The meta-data that describes the final document output
  2. Markdown section - content/body of the document - your text/notes, images, links, etc
  3. Code chunks - where the R* code goes

An R Markdown file by itself is quite simple, but it renders neatly into more complex document types

*knitr, which runs the code chunks, actually supports many more language engines (52 at the time of this talk), including Python, Julia, C++, SQL, bash, etc.
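The engine count varies between knitr versions. A chunk switches language through the name in its header, e.g. {python} or {bash} instead of {r}. A quick check of what your own installation supports, assuming knitr is installed (it comes with R Markdown):

```r
library(knitr)

# knit_engines is knitr's registry of language engines;
# $get() returns a named list with one entry per engine
engines <- names(knit_engines$get())

length(engines)      # number of engines in this knitr version
head(sort(engines))  # a few of their names
```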

R Markdown & Analysis Reproducibility

Document what you’ve done with your data in code

R Markdown can render multiple different document types from one Rmd file

The more places (files) an analysis is spread across, the more work it is to keep all of it accurate and up-to-date.

Rendering an R Markdown document re-executes all code chunks; if your code doesn’t work, the render fails
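Rendering runs knitr under the hood. A minimal sketch of this fail-fast behaviour, calling knitr::knit() directly on a throwaway file whose contents are hypothetical. Plain knit() keeps errors in the output by default (error = TRUE), so the failing chunk sets error = FALSE to mimic R Markdown's stop-on-error behaviour:

```r
library(knitr)

# Hypothetical two-chunk Rmd: the second chunk uses an object that doesn't exist
rmd <- tempfile(fileext = ".Rmd")
writeLines(c(
  "```{r}",
  "x <- 1:10",
  "```",
  "",
  "```{r, error = FALSE}",
  "mean(undefined_object)",
  "```"
), rmd)

# Knitting runs every chunk top to bottom; with error = FALSE the
# failing chunk stops the whole document instead of producing output
failed <- tryCatch({
  knit(rmd, output = tempfile(fileext = ".md"), quiet = TRUE)
  FALSE
}, error = function(e) TRUE)

failed
```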

R Markdown allows you to focus on generating content & doing your analysis without (hopefully) spending too much time fighting your document itself*

*the more a document is geared towards a particular output type, the harder it is to neatly convert between formats

YAML header


---
title: "Rmarkdown Quickstart"
author: "Adele Barugahare"
date: "27/08/2019"
output: 
  ioslides_presentation:
    df_print: "paged"
  html_document:
    df_print: "paged"
    toc: true
    toc_depth: 2
  pdf_document:
    number_sections: true
    df_print: "kable"
---
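One Rmd file can list several output formats in its header. A minimal sketch, assuming the rmarkdown package is installed, that writes a shortened copy of such a header to a temporary file and parses it with rmarkdown::yaml_front_matter(). The render call is left commented out because it needs pandoc (and a LaTeX install for the PDF):

```r
library(rmarkdown)

# A shortened copy of a multi-format YAML header, written to a temp file
rmd <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  'title: "Rmarkdown Quickstart"',
  "output:",
  "  ioslides_presentation:",
  '    df_print: "paged"',
  "  html_document:",
  "    toc: true",
  "  pdf_document:",
  "    number_sections: true",
  "---",
  "",
  "Some text."
), rmd)

# yaml_front_matter() parses the header into a nested R list
front <- yaml_front_matter(rmd)
names(front$output)

# Render every format listed under `output:` in one call:
# render(rmd, output_format = "all")
```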

Code chunks

```{r chunk_label, echo = TRUE}
# Code analysis goes here
x <- 1:10
y <- x * 2

plot(x, y)
# ...more analysis code
```
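Options written in a chunk header apply to that chunk only. Defaults for the whole document can be set once through knitr's opts_chunk, typically in a first "setup" chunk. A sketch with a few common options (the values are only examples):

```r
library(knitr)

# Document-wide chunk defaults; any chunk can still override them in its header
opts_chunk$set(
  echo = FALSE,      # hide code, show only results
  message = FALSE,   # suppress messages such as package start-up output
  fig.width = 6,     # default figure size, in inches
  fig.height = 4
)

opts_chunk$get("echo")
```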

R Markdown Let’s Play

Create an account at RStudio Cloud

Link to my GitHub: https://bit.ly/2lyGAqr

OR

GitHub account name: aabarug

Repo: quickstart_rmarkdown_sept_2019

The repo README has a link to this workspace; you should be able to copy it into a workspace of your own

R Data Analysis Toolbox

  • Tidyverse - an opinionated collection of R packages designed for data science: ggplot2, dplyr, magrittr, tidyr, readr, etc
    • ggplot2 - extensive plotting package
  • Shiny - build interactive web applications/dashboards
  • R Markdown - document generation

Tabular data is represented as data frames, a built-in class
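A small data frame built with base R's data.frame(). The values are made up; the column names are borrowed from the airline dataset used later in this talk:

```r
# Hypothetical values in the style of the airline data
flights <- data.frame(
  Route = c("Adelaide-Brisbane", "Adelaide-Brisbane", "Melbourne-Sydney"),
  Year = c(2008, 2008, 2008),
  Cancellations = c(2, 5, 11)
)

class(flights)   # "data.frame"
str(flights)     # one column per variable, one row per observation
```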

Tidyverse

  • ‘Modern’ way of writing R and geared at data science
  • Fixes some quirky behaviour from base R
  • Improved data-frames - tibbles
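Two of the base R quirks that tibbles smooth over, side by side. A minimal sketch assuming the tibble package (installed with the tidyverse):

```r
library(tibble)

df <- data.frame(value = 1:3)
tb <- tibble(value = 1:3)

# Base data frames partially match column names with `$`
# (no warning under default options)...
df$val                      # returns 1:3
# ...tibbles do not: they warn and return NULL
suppressWarnings(tb$val)    # NULL

# Selecting one column with [ drops a data frame to a plain vector,
# but a tibble stays a tibble
class(df[, 1])              # "integer"
class(tb[, 1])              # "tbl_df" "tbl" "data.frame"

# tibble() can also refer to columns defined earlier in the same call
tibble(x = 1:3, y = x * 2)
```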

Tidy data:

  1. Each variable is in a column.
  2. Each observation is a row.
  3. Each value is a cell.
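A sketch of tidying a hypothetical "wide" table in which the month variable is hidden in the column names. It uses tidyr::pivot_longer(), which arrived in tidyr 1.0.0; on the 0.8.x version shown in the session output further on, gather() does the same job:

```r
library(tidyr)

# Hypothetical wide table: one column per month (not tidy)
wide <- data.frame(
  Route = c("Adelaide-Brisbane", "Melbourne-Sydney"),
  Jan = c(3, 10),
  Feb = c(1, 7)
)

# Move the month names into their own column: one row per
# route-month observation, one value per cell
tidy <- pivot_longer(wide, cols = c(Jan, Feb),
                     names_to = "Month", values_to = "Cancellations")
tidy
```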

dplyr

  • is a grammar of data manipulation, providing a consistent set of verbs that help you solve the most common data manipulation challenges
  • each verb takes a data frame as input and returns a modified version of it
  • the idea is that complex operations can be performed by stringing together a series of simpler operations in a pipeline.
input       +--------+        +--------+        +--------+      result
data   %>%  |  verb  |  %>%   |  verb  |  %>%   |  verb  |  ->  data
frame       +--------+        +--------+        +--------+      frame

%>% - pipe operator (magrittr package) that passes the output of one function as the first argument of the next
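A minimal dplyr pipeline in that shape, on made-up rows (the airline labels and numbers are hypothetical). Each verb takes a data frame and returns a new one, so the steps chain with %>%:

```r
library(dplyr)

# Hypothetical rows shaped like the airline data
flights <- tibble(
  Route = c("Adelaide-Brisbane", "Adelaide-Brisbane", "Melbourne-Sydney"),
  Airline = c("Airline A", "Airline B", "Airline A"),
  Sectors_Scheduled = c(100, 80, 900),
  Cancellations = c(2, 4, 27)
)

result <- flights %>%
  filter(Route == "Adelaide-Brisbane") %>%                    # keep matching rows
  mutate(Cancel_Rate = Cancellations / Sectors_Scheduled) %>%  # add a column
  arrange(desc(Cancel_Rate))                                   # sort, highest rate first

result
```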

Dataset > Manipulate to extract information > Plot > Communicate

Read in data:

## ── Attaching packages ───────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.2.0     ✔ purrr   0.3.2
## ✔ tibble  2.1.1     ✔ dplyr   0.8.1
## ✔ tidyr   0.8.3     ✔ stringr 1.4.0
## ✔ readr   1.3.1     ✔ forcats 0.4.0
## ── Conflicts ──────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## Parsed with column specification:
## cols(
##   Route = col_character(),
##   Departing_Port = col_character(),
##   Arriving_Port = col_character(),
##   Airline = col_character(),
##   Month = col_double(),
##   Sectors_Scheduled = col_double(),
##   Sectors_Flown = col_double(),
##   Cancellations = col_double(),
##   Departures_On_Time = col_double(),
##   Arrivals_On_Time = col_double(),
##   Departures_Delayed = col_double(),
##   Arrivals_Delayed = col_double(),
##   Year = col_double(),
##   Month_Num = col_double()
## )
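The "Parsed with column specification" message comes from readr guessing each column's type. A sketch that states the types up front instead, which also silences the message. The real file name isn't shown here, so a tiny made-up CSV with a subset of the columns is written to a temporary file:

```r
library(readr)

# Hypothetical two-row extract with a subset of the columns
csv <- tempfile(fileext = ".csv")
writeLines(c(
  "Route,Departing_Port,Arriving_Port,Airline,Month_Num,Year,Cancellations",
  "Adelaide-Brisbane,Adelaide,Brisbane,Airline A,1,2008,2",
  "Adelaide-Brisbane,Adelaide,Brisbane,Airline B,2,2008,4"
), csv)

# Explicit col_types: text columns as character, everything else as double
flights <- read_csv(
  csv,
  col_types = cols(
    .default = col_double(),
    Route = col_character(),
    Departing_Port = col_character(),
    Arriving_Port = col_character(),
    Airline = col_character()
  )
)
```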

The dataset:

Australian domestic airlines on-time performance dataset covering 2004 to 2019: 80615 rows and 14 columns, from http://data.gov.au/

Dataset > Manipulate to extract information > Plot > Communicate

Extract route ‘Adelaide-Brisbane’ in the year 2008 & fix up the month column
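The code for this step isn't included here, so this is only a sketch with dplyr on made-up rows. In particular, the month "fix" is an assumption: it converts Month_Num into month abbreviations with factor levels in calendar order, so they sort and plot Jan, Feb, ...

```r
library(dplyr)

# Hypothetical rows; column names follow the specification printed above
flights <- tibble(
  Route = c("Adelaide-Brisbane", "Adelaide-Brisbane", "Adelaide-Brisbane", "Melbourne-Sydney"),
  Year = c(2008, 2008, 2009, 2008),
  Month_Num = c(2, 1, 1, 1),
  Cancellations = c(4, 2, 3, 27)
)

adl_bne_2008 <- flights %>%
  filter(Route == "Adelaide-Brisbane", Year == 2008) %>%
  # Assumed fix: month number -> abbreviation, levels in calendar order
  mutate(Month = factor(month.abb[Month_Num], levels = month.abb)) %>%
  arrange(Month)

adl_bne_2008
```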

Dataset > Manipulate to extract information > Plot > Communicate

ggplot2

Implements the grammar of graphics, a coherent system for describing and building graphs

Takes a data frame as input, describes which columns map to which aesthetics, and then builds a plot by layering ‘geoms’.

Geoms define the type of plot the data should be displayed as.

The top-level aesthetics & data are passed on to all geoms, but can be overridden by specifying new data/aesthetics for a specific geom
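A sketch of overriding, assuming ggplot2 is installed and using made-up monthly counts. The line inherits the top-level data and aesthetics, while geom_point() is given its own data (a hypothetical one-row "worst month" subset) to highlight a single point:

```r
library(ggplot2)

# Hypothetical monthly counts
monthly <- data.frame(Month_Num = 1:6, Cancellations = c(2, 4, 1, 5, 3, 6))
worst <- monthly[monthly$Cancellations == max(monthly$Cancellations), ]

p <- ggplot(monthly, aes(x = Month_Num, y = Cancellations)) +  # top-level data + aesthetics
  geom_line() +                                      # inherits both
  geom_point(data = worst, colour = "red", size = 3)  # new data, inherited aesthetics

p
```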

ggplot2 predates magrittr, hence it uses + instead of %>%

No geom layer - data is not displayed but a plot is generated

You can keep adding geoms…
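A sketch of layering on made-up counts: a ggplot with only data and aesthetics has no layers (axes, but nothing plotted), and each + geom_*() adds one on top of the previous ones:

```r
library(ggplot2)

# Hypothetical monthly counts
monthly <- data.frame(Month_Num = 1:6, Cancellations = c(2, 4, 1, 5, 3, 6))

# Data and aesthetics only: a plot object with no layers
base <- ggplot(monthly, aes(x = Month_Num, y = Cancellations))
length(base$layers)     # 0

# Each geom becomes a new layer, drawn in the order it was added
layered <- base +
  geom_col(fill = "grey80") +
  geom_line() +
  geom_point()
length(layered$layers)  # 3
```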

Packages For Interactivity

Plotly

Build interactive plots with plotly directly, or use it as a wrapper around a ggplot object with ggplotly()
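Both routes in one sketch, assuming the plotly and ggplot2 packages are installed and using made-up counts. The results are htmlwidgets, so they are meant for HTML-based outputs such as html_document or ioslides rather than PDF:

```r
library(ggplot2)
library(plotly)

# Hypothetical monthly counts
monthly <- data.frame(Month_Num = 1:6, Cancellations = c(2, 4, 1, 5, 3, 6))

p <- ggplot(monthly, aes(x = Month_Num, y = Cancellations)) +
  geom_line() +
  geom_point()

# Wrap the static ggplot: adds hover tooltips, zoom and pan
p_interactive <- ggplotly(p)

# Or build the plotly chart directly, without ggplot2
p_direct <- plot_ly(monthly, x = ~Month_Num, y = ~Cancellations,
                    type = "scatter", mode = "lines+markers")
```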

Dataset > Manipulate to extract information > Plot > Communicate

R Markdown Outputs

Supported Document Outputs

  • webpages
  • R Notebooks
  • PDFs
  • Slideshows
  • Books
  • Websites

…and more created by the R community

Packages that further build on top of R Markdown

  • blogdown - combines R Markdown & Hugo to create general-purpose websites
  • bookdown - authoring books, theses, software manuals, etc.
  • flexdashboard - HTML outputs with dashboard layouts
  • xaringan - slideshows with remark.js

Neat Examples